Queuing networks can be used to model maintained systems.

Under  many  conditions,  closed  queuing  network  theory  can  be  applied 
to  ascertain  the  availability  of  such  systems.  Multi-echelon 
repairable  item  inventory  systems  serve  as  one  such  class  of  examples. 
Problems  of  common  interest  to  the  reliability,  queuing,  and  inventory 
communities  are  highlighted,  and  solution  techniques  for  these  problems 
presented. 
1.  Introduction


The major purpose of this paper is to illustrate a class of problems
of mutual interest to the reliability, queuing, and inventory
communities. Although these problems are often studied separately, the
three communities share an interest in them, and all could benefit from
closer interaction.

2.  A  Reliability  Problem 

In  Mann,  Schafer  and  Singpurwalla  (1974),  Section  10.3  deals 
with  reliability  models  for  maintained  systems.  In  particular,  Section 
10.3.1  gives  an  example  of  a  single  unit  which  fails  according  to  an 
exponential distribution with mean time to failure (MTTF) of, say,
$1/\lambda$ and is repaired (as good as new) according to an exponential
distribution with mean time to repair (MTTR) of, say, $1/\mu$. This process
is then a continuous time Markov process (CTMP) and is driven by the
infinitesimal generator or, as it is also often called, the rate matrix

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}, \tag{1}$$

with rows and columns indexed by the states 0 and 1.

The two possible system states are 0 (unit is operating) or 1
(unit is down, and undergoing repair).

We desire to find the availability of the unit at time $t$, which
we denote as $A(t)$, and to do this we need to find $p(t)$, the state
probability (row) vector at time $t$, that is,

$$p(t) = \bigl(p_0(t),\; p_1(t)\bigr)$$

and hence

$$A(t) = p_0(t).$$

To find $p(t)$, we must solve the Kolmogorov forward equations
(a set of differential-difference equations)

$$p'(t) = p(t)\,Q, \tag{2}$$

with the added condition that the probabilities sum to one, namely,

$$1 = p(t)\,e, \tag{3}$$

where $e$ is a column vector of 1's. Thus writing out (2) and (3) we
have

$$p_0'(t) = -\lambda\, p_0(t) + \mu\, p_1(t) \tag{4}$$

$$p_1'(t) = \lambda\, p_0(t) - \mu\, p_1(t) \tag{5}$$

$$1 = p_0(t) + p_1(t), \tag{6}$$

and we must solve the set of equations (4) and (6) or (5) and (6). This
can be easily done using Laplace transforms [we employ the boundary
condition $\{p_0(0) = 1,\; p_1(0) = 0\}$; that is, the unit is working at time
zero] and obtain

$$A(t) = p_0(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t}. \tag{7}$$
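As a check on (7), the Laplace transform step can be worked through as follows. This is only a sketch: the symbol $\hat{p}_0(s)$ for the transform of $p_0(t)$ is notation introduced here for illustration and does not appear elsewhere in the paper.

```latex
% Substitute p_1(t) = 1 - p_0(t) from (6) into (4):
p_0'(t) = \mu - (\lambda+\mu)\, p_0(t), \qquad p_0(0) = 1 .

% Transform both sides, using the boundary condition p_0(0) = 1:
s\,\hat{p}_0(s) - 1 = \frac{\mu}{s} - (\lambda+\mu)\,\hat{p}_0(s)
\quad\Longrightarrow\quad
\hat{p}_0(s) = \frac{s+\mu}{s\,(s+\lambda+\mu)} .

% Partial fractions:
\hat{p}_0(s) = \frac{\mu}{\lambda+\mu}\cdot\frac{1}{s}
             + \frac{\lambda}{\lambda+\mu}\cdot\frac{1}{s+\lambda+\mu} .

% Invert term by term to recover (7):
p_0(t) = \frac{\mu}{\lambda+\mu}
       + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t} .
```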


Note that the steady state availability is

$$A = \lim_{t\to\infty} A(t) = \frac{\mu}{\lambda+\mu},$$

the well known result for an alternating renewal process.
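The closed form (7) and its limit are easy to evaluate numerically. The short Python sketch below does so; the function names and the rate values (MTTF of 100 hours, MTTR of 5 hours) are assumptions chosen purely for illustration.

```python
import math


def availability(lam: float, mu: float, t: float) -> float:
    """A(t) from equation (7): unit is up at time 0, failure rate lam, repair rate mu."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def steady_state_availability(lam: float, mu: float) -> float:
    """Limit of A(t) as t grows: mu / (lam + mu)."""
    return mu / (lam + mu)


if __name__ == "__main__":
    # Hypothetical rates: MTTF = 100 h, MTTR = 5 h.
    lam, mu = 1.0 / 100.0, 1.0 / 5.0
    for t in (0.0, 5.0, 50.0):
        print(t, availability(lam, mu, t))
    print("steady state:", steady_state_availability(lam, mu))
```

A(t) starts at 1 and decreases monotonically toward the steady-state value, at a rate governed by $\lambda + \mu$.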


3.  An  Expanded  Reliability  Problem 

We now consider an expanded version of the problem treated in
Section 2. Consider now $N$ units and $c$ repair channels ($c < N$).

We now define $A(t)$ as the probability that at least some desired number,
say $M$, of the units is operational at time $t$. If more than $M$ are
operational, the excess are considered spares and are on cold standby
(note that there are a total of $N - M = y$ spares in the system, but that
the $y$ spares are not always all available). If fewer than $M$ units are
operational, the system is performing below the desired level.

A system state can be described by the number of units up (or
operating, call this $n_U$) or by the number of units in or awaiting
repair (call this $n_R$). Either state descriptor gives complete
information since $n_U + n_R = N$. For this problem, the $Q$ matrix is
$(N+1) \times (N+1)$, as there are a total of $N+1$ states: $0, 1, 2, \ldots, N$.
Hence it is necessary to solve a set of $N+1$ linear, first-order
differential equations of the type given by (2).
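To make the structure of this larger rate matrix concrete, the Python sketch below builds it under stated assumptions that the text does not spell out: the state is the number of units in or awaiting repair, each operating unit fails at rate lam, spares on cold standby do not fail, and each busy repair channel works at rate mu. The function name and argument names are hypothetical.

```python
def machine_repair_generator(N, M, c, lam, mu):
    """Rate matrix Q for N units, M desired operating, c repair channels.

    State n (0..N) is the number of units in or awaiting repair.
    Assumes exponential failures (rate lam per operating unit), exponential
    repairs (rate mu per busy channel), and no failures on cold standby.
    """
    size = N + 1
    Q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        operating = min(N - n, M)   # units actually running in state n
        busy = min(n, c)            # repair channels in use in state n
        if n < N:
            Q[n][n + 1] = operating * lam   # one more unit fails
        if n > 0:
            Q[n][n - 1] = busy * mu         # one repair completes
        Q[n][n] = -(operating * lam + busy * mu)
    return Q


if __name__ == "__main__":
    for row in machine_repair_generator(N=5, M=3, c=2, lam=0.1, mu=1.0):
        print(row)
```

With N = M = c = 1 this reduces to the two-state matrix of equation (1), since state 0 (nothing in repair) is then the operating state.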
4.  A  Queuing  Problem

The  above  "reliability  problem"  is  also  a  "classical"  problem  in 
queuing  theory  and  is  known  as  the  machine  repair  problem  [see  Cooper 
(1981,  Section  3.8),  Kleinrock  (1974,  Section  3.8),  or  Gross  and  Harris 
(1974, Section 3.6)]. Figure 1 shows a schematic of this problem, modeled
as a closed queuing network. This is a two node, closed queuing
network,  where  the  total  of  N  units  are,  at  various  times  and  in 
various  combinations,  distributed  among  the  two  nodes.  At  the  operating 
node,  we  show  M  parallel  service  channels  so  that  a  queue  at  this  node 
represents  the  cold  standby  spares  available. 

5.  A  Repairable  Item  Inventory  Problem 

The problem discussed in Sections 3 and 4 also fits the category
of an inventory problem. It is a typical "repairable item inventory
problem," for which it is desired to find the optimal combination of the
numbers of spares and repair channels, so as to satisfy certain service
level performance criteria. Thus the problem mathematically is to find
$y$ and $c$ which

$$\text{Minimize } E[\text{Cost/Year}] = k_1 y + k_2 c$$

subject to

$$A(t_i) \equiv \sum_{n=0}^{y} p_n(t_i) \ge 1 - \alpha \qquad (i = 1, 2, \ldots, T)$$

$$L(t_i) \equiv \sum_{n=0}^{N} n\, p_n(t_i) \le \ell \qquad (i = 1, 2, \ldots, T),$$

$$\vdots$$

where $p_n(t_i)$ is the probability that $n$ units are at the repair node
at time $t_i$, $k_1$ is the annual cost associated with having a spare
(amortization of purchase cost including interest, insurance, storage,
etc.), $k_2$ is the annual cost associated with each repair channel
(amortization, salary of repair crew, maintenance of repair equipment,
etc.), $1 - \alpha$ is the desired availability, and $\ell$ is the desired
limit on the average number of units in or awaiting repair. The dots
represent other constraints that may possibly be imposed, for example,
a constraint on total budget.

[Figure 1. Schematic of the two-node closed queuing network; node labels: Operating, Repair.]
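Given a state probability vector, the two service-level quantities in the constraints above are simple sums. The Python sketch below evaluates them and the cost objective; the function names, the cost coefficients, and the sample probability vector are all made up for illustration and are not taken from the paper.

```python
def service_levels(p, y):
    """Return (A, L) for a probability vector p over n = 0..N units in repair.

    A = sum of p[n] for n <= y (at least M = N - y units operational).
    L = expected number of units in or awaiting repair.
    """
    availability = sum(p[: y + 1])
    mean_in_repair = sum(n * pn for n, pn in enumerate(p))
    return availability, mean_in_repair


def annual_cost(k1, k2, y, c):
    """Objective value k1*y + k2*c (spares cost plus repair-channel cost)."""
    return k1 * y + k2 * c


def is_feasible(p_by_time, y, alpha, limit):
    """Check both constraints at every evaluation time t_1, ..., t_T."""
    for p in p_by_time:
        a, mean_in_repair = service_levels(p, y)
        if a < 1.0 - alpha or mean_in_repair > limit:
            return False
    return True


if __name__ == "__main__":
    p = [0.5, 0.3, 0.15, 0.05]   # hypothetical distribution, N = 3
    print(service_levels(p, y=1))
    print(is_feasible([p], y=1, alpha=0.25, limit=1.0))
```

An optimization over $y$ and $c$ would recompute the probability vectors for each candidate pair before calling such a check; that outer search is outside the scope of this sketch.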

In  order  to  solve  this  problem,  it  is  necessary  first  to  find 
p(t)  ,  and  this  is  what  we  focus  our  attention  on  here.  We  refer  the 
reader to Gross, Miller and Soland (1983) for a discussion of the
optimization aspects of such problems.

Consider a more complex multi-echelon version of the above
problem as shown in Figure 2. Pictured here are three "field" locations,
each with local repair capability. However, depending on the problem
causing the failure, a certain percentage of failed units must be sent
to a higher echelon (depot) to be repaired. Each field location, as
well as the depot, stocks spare units which, if available, are
dispatched from the location to which the failed unit is sent. If spares
are not available, requests are backordered.

Figure 2. Multi-echelon repairable item inventory system.

As  long  as  all  failure  and  repair  times  are  exponential,  we  still 
have  a  CTMP,  albeit  with  a  very  large  (but  finite)  Q  matrix.  For 
example,  Table  1  shows  a  specific  example  which  yields  a  state  space 
of  over  100  million  states. 

For such systems, shown in Figure 2, we might desire $A_1(t)$,
$A_2(t)$, $A_3(t)$ and $A_{123}(t)$, where $A_i(t)$ is the probability that
$M_i$ units are operating at field location $i$ at time $t$ ($i = 1,2,3$) and
$A_{123}(t)$ is the probability that at time $t$, $M_1$ are operating at
location 1, $M_2$ at location 2, and $M_3$ at location 3 simultaneously.

In the example given in Table 1, $M_i = 25$, $i = 1,2,3$.
TABLE 1

THREE LOCATION, TWO ECHELON EXAMPLE

Location     N     M     c
1           29    25     3
2           29    25     3
3           29    25     3
Depot        6     -     4

Number of states = |S| = 100,706,625

6.  Solution  Techniques 

Obviously, for large systems, the use of Laplace transforms for
obtaining p(t) is not feasible. Since our systems are finite, numerical
methods can be utilized. Numerical integration techniques such as
Runge-Kutta or predictor-corrector can be employed for moderately sized
systems. For these types of problems, we found another method, which we
refer to as randomization, to be more efficient. For details of the
development  of  this  procedure,  we  refer  the  reader  to  Grassmann  (1977a 
and  1977b),  or  to  Gross  and  Miller  (1984a,  1984b).  The  randomization 
method,  as  far  as  we  can  ascertain,  dates  back  at  least  to  a  paper  by 
Jensen  (1953),  and  is  mentioned,  often  under  other  names  (for  example, 
subordination  of  Markov  chains  to  Poisson  processes  or  uniformized 
embedded Markov chains), by Cohen (1969), Feller (1971), Çinlar (1975),
Keilson  and  Kester  (1977),  and  Keilson  (1979),  to  mention  a  few. 
The basic idea of the randomization technique is to view the CTMP
in a certain way, which allows the major computation to be performed on
an imbedded discrete time Markov chain (DTMC) called the uniformized
chain. The transitions for this DTMC are generated by an underlying
Poisson process (hence the name randomization). The single-step
transition probability matrix of the DTMC and the parameter (rate) of
the Poisson process are functions of the original rate matrix,
$Q = \{q_{ij}\}$, of the CTMP.

Let

$$\Lambda = \max_i |q_{ii}|$$

and

$$P = Q/\Lambda + I.$$

Then the imbedded uniformized DTMC has single-step transition probability
matrix $P$ and the transitions of this DTMC are generated by a Poisson
process with rate $\Lambda$. Note that since the diagonal elements of $Q$
are negative, that is,

$$q_{ii} = -\sum_{j \ne i} q_{ij},$$

$\Lambda$ is actually the absolute value of the minimum diagonal element, which
is the mean exit rate of the state with the largest mean exit rate.
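The construction of the uniformized chain can be sketched in a few lines of Python. The function name `uniformize` and the list-of-lists matrix representation are assumptions made for this illustration; the sketch also assumes that at least one state has a positive exit rate, so that the Poisson rate is nonzero.

```python
def uniformize(Q):
    """Return (rate, P) with rate = max_i |q_ii| and P = Q / rate + I.

    Q is a square rate matrix given as a list of rows. Assumes at least one
    diagonal entry is strictly negative (otherwise the rate would be zero).
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))
    P = [
        [Q[i][j] / rate + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return rate, P


if __name__ == "__main__":
    # Two-state example with hypothetical rates lambda = 1, mu = 3.
    rate, P = uniformize([[-1.0, 1.0], [3.0, -3.0]])
    print(rate)
    for row in P:
        print(row)
```

Because every off-diagonal entry of Q is nonnegative and no diagonal entry exceeds the chosen rate in absolute value, each row of P is a probability distribution. States with smaller exit rates simply acquire self-transitions.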

Denoting the state probability vector after $k$ transitions of
the DTMC by $\phi(k)$, it can be shown [see Gross and Miller (1984a)] that

$$p_j(t) = \sum_{k=0}^{\infty} \phi_j(k)\, \frac{(\Lambda t)^k e^{-\Lambda t}}{k!}, \tag{8}$$

where $p_j(t)$ is the probability that the CTMP is in state $j$ at time
$t$ ($j$th element of $p(t)$), $\phi_j(k)$ is the probability that the imbedded
uniformized DTMC is in state $j$ after $k$ transitions ($j$th element
of $\phi(k)$) and $(\Lambda t)^k e^{-\Lambda t}/k!$ is the probability of $k$
transitions of the DTMC in clock time $t$. The usual recursion can be used
to get $\phi(k)$, that is,

$$\phi(0) = p(0); \qquad \phi(k) = \phi(k-1)\,P. \tag{9}$$

To use (8) for computational purposes, the infinite sum must be
truncated. The error of truncation can be nicely bounded since we are
discarding a Poisson tail; and, in fact, the computing version of (8)
becomes

$$p_j(t) \approx \sum_{k=0}^{T(\varepsilon,t)} \phi_j(k)\, \frac{(\Lambda t)^k e^{-\Lambda t}}{k!}, \tag{10}$$

where

$$\sum_{k=0}^{T(\varepsilon,t)} \frac{(\Lambda t)^k e^{-\Lambda t}}{k!} \ge 1 - \varepsilon, \tag{11}$$

$\varepsilon$ being the desired error bound. One advantage of this method over
numerical integration (besides efficiency) is the ability to exactly
bound the computational error.
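Equations (8) through (11) translate fairly directly into code. The Python sketch below is a minimal, unoptimized illustration using the plain recursion (9), not the more efficient procedure of Gross and Miller (1984a); the function name and the stopping rule (first term count at which the accumulated Poisson mass reaches $1 - \varepsilon$) are assumptions. The naive Poisson weight recursion also underflows when $\Lambda t$ is very large, which the sketch reports rather than handles.

```python
import math


def transient_distribution(Q, p0, t, eps=1e-6):
    """Approximate p(t) by randomization; returns (p, number_of_terms_T).

    Q: rate matrix (list of rows); p0: initial probability vector.
    eps should be well above floating-point rounding error.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))                  # Lambda
    P = [[Q[i][j] / rate + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]                                 # P = Q/Lambda + I
    phi = list(p0)                                          # phi(0) = p(0)
    weight = math.exp(-rate * t)                            # Poisson prob. of k = 0
    if weight == 0.0:
        raise ValueError("rate * t too large for this naive weight recursion")
    cumulative = weight
    p = [weight * x for x in phi]
    k = 0
    while cumulative < 1.0 - eps:                           # stopping rule (11)
        k += 1
        phi = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]  # (9)
        weight *= rate * t / k                              # next Poisson term
        if weight == 0.0:
            raise ValueError("Poisson weights underflowed before reaching 1 - eps")
        cumulative += weight
        for j in range(n):
            p[j] += weight * phi[j]                         # partial sum of (10)
    return p, k


if __name__ == "__main__":
    lam, mu, t = 0.5, 2.0, 1.5   # hypothetical rates and time
    p, terms = transient_distribution([[-lam, lam], [mu, -mu]], [1.0, 0.0], t)
    exact = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
    print(p[0], exact, terms)
```

For the two-state unit of Section 2 the result can be compared with the closed form (7); each component differs from the exact value by at most eps, since the discarded terms are Poisson probabilities multiplied by quantities no larger than one.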


7.  Results 

The largest problem solved directly by the procedure described
in the previous section is shown in Table 2. This example is a two
field location, two echelon system with a state space size of 20,748.
Calculated were $A_1(t)$, $A_2(t)$, $A_{12}(t)$, $t = 1, 2, \ldots, 15$, with the
following time-varying scenario. At time $t = 6$, a sudden decrease
of MTTF occurs. The repair facilities cannot make an "in kind"
accommodation until time 10. Figure 3 shows a plot of $A_1(t)$ versus $t$
[$A_2(t)$ and $A_{12}(t)$ are similar in nature]. The graph shows an initial
$A_1(0)$ of 1.0 (we assume at time zero all units are operational) and
thereafter a drop-off toward the steady-state availability as time
increases. At time 6, the increase in failure rate occurs and $A(t)$
begins to drop off at an increasing rate, heading for a new, lower
steady-state availability. However, the increase in repair rate at
time 10 causes $A(t)$ to begin to rise, heading back toward the original
steady-state availability.

TABLE 2

TWO LOCATION, TWO ECHELON EXAMPLE

Location     N     y     c
1           18     4     2
2           13     2     2
Depot        -     3     4

ε = .001,  |S| = 20,748

This run took approximately 25 minutes of CPU time on a VAX
11/780 computer using the randomization computation of (10) with a more
efficient procedure than the recursion of (9) [given in Gross and Miller
(1984a)] for calculating $\phi(k)$.
Figure 3. $A_1(t)$ versus $t$ for sample run. [Plot not reproduced; vertical axis A(t) from 0.0 to 1.0, horizontal axis t from 0.0 to 13.0.]




T-491 


As  the  systems  become  more  complex  (more  bases,  multiple  component 
types,  indenture,  more  echelons,  etc.)  the  state-space  grows  rapidly. 

We have solved a three location problem, shown in Table 3, using a
truncated state-space approach, where seldom visited states are "lumped"
together in single absorbing states [see Gross, Kioussis, and Miller
(1984)]. There are over 43 million states, but via the truncation
approach, the state-space was reduced to 23,410 and solved in about 30
minutes CPU time on the VAX 11/780, adding an error of .007.
Calculated were $A_1(t)$, $A_2(t)$, $A_3(t)$ and $A_{123}(t)$, for $t = 1, 2, \ldots, 15$.


TABLE 3

THREE LOCATION, TWO ECHELON EXAMPLE

Location     N     y     c
1           25     1     1
2           24     1     1
3           24     1     1
Depot        -     2     1

ε = .001,  |S| = 43,278,703
8.  Conclusions

We  have  presented  here  a  class  of  problems  of  interest  to  the 
reliability,  queuing  and  inventory  communities  and  briefly  demonstrated 
a  viable  solution  procedure  for  these  problems.  While  researchers  in 
the above communities often go their "separate ways," better communication
among them should benefit all.


